Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data
Abstract
Population stratification, or confounding by genetic ancestry, is a potential cause of false associations in genetic association studies. Estimating and adjusting for genetic ancestry has become common practice, thanks in part to the availability of ancestry-informative markers on genome-wide association study (GWAS) arrays. Although array data are now widespread, they are not ubiquitous: several large epidemiologic and clinic-based studies lack genome-wide data. One such study that is accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), a series of population-based cross-sectional surveys of Americans conducted by the Centers for Disease Control and Prevention and linked to demographic, health, and lifestyle data. DNA samples (n = 14,998) were extracted from biospecimens of consented NHANES participants in 1991-1994 (NHANES III, phase 2) and 1999-2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). As the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, we genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology (PAGE) I study for collaborative genetic association studies. To enable basic quality control in NHANES sans genome-wide data, such as estimating genetic ancestry to control for population stratification, we outline strategies that use limited genetic data to identify the markers best suited to characterizing genetic ancestry. From the 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999-2002, respectively, we demonstrate that ancestry-informative markers can be identified and used to estimate global ancestry.
Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for most participants, although less so with self-identified ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are accessible for genotype-phenotype studies but lack genome-wide data.
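The abstract does not state which selection criterion or ancestry model was used, so the following is only a minimal sketch under stated assumptions. It ranks SNPs by the absolute allele-frequency difference (delta) between two reference groups and keeps those above a threshold, then estimates one participant's global ancestry as the proportion q in a two-population admixture model, where the expected allele frequency is p = q·p_A + (1 − q)·p_B and each genotype g has probability C(2, g)·p^g·(1 − p)^(2 − g). The function names `select_aims` and `estimate_ancestry`, the 0.3 threshold, and the SNP ids in the demo are hypothetical, chosen for illustration only.

```python
"""Hedged sketch: pick ancestry-informative SNPs from a small panel, then
estimate two-way global ancestry for one participant.

Assumptions (illustrative, not the study's published procedure):
  * informativeness = absolute allele-frequency difference (delta) between
    two reference groups;
  * global ancestry = proportion q from a two-population admixture model,
    found by a grid search over the log-likelihood;
  * genotypes are 0/1/2 copies of the reference allele, None = missing.
"""
import math
from typing import Dict, List, Optional, Sequence

Genotypes = Sequence[Optional[int]]


def allele_freq(genotypes: Genotypes) -> float:
    """Reference-allele frequency among called (non-missing) genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


def select_aims(group_a: Dict[str, Genotypes], group_b: Dict[str, Genotypes],
                min_delta: float = 0.3) -> List[str]:
    """SNPs typed in both groups with delta >= min_delta, most informative first."""
    scored = []
    for snp in set(group_a) & set(group_b):
        d = abs(allele_freq(group_a[snp]) - allele_freq(group_b[snp]))
        if d >= min_delta:
            scored.append((d, snp))
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by SNP id
    return [snp for _, snp in scored]


def genotype_loglik(g: int, p: float, eps: float = 1e-6) -> float:
    """Binomial log-likelihood of genotype g given allele frequency p (HWE assumed)."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)


def estimate_ancestry(genotypes: Genotypes, freq_a: Sequence[float],
                      freq_b: Sequence[float], steps: int = 100) -> float:
    """Grid-search maximum-likelihood q, the fraction of ancestry from group A."""
    if not len(genotypes) == len(freq_a) == len(freq_b):
        raise ValueError("genotypes and frequency vectors must align")
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        # Missing genotypes simply contribute nothing to the likelihood.
        ll = sum(genotype_loglik(g, q * pa + (1.0 - q) * pb)
                 for g, pa, pb in zip(genotypes, freq_a, freq_b) if g is not None)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


if __name__ == "__main__":
    # Toy reference panels with made-up SNP ids; not NHANES genotypes.
    ref_a = {"rs1": [2, 2, 1, 2], "rs2": [1, 1, 1, 1], "rs3": [2, 1, None, 1]}
    ref_b = {"rs1": [0, 0, 1, 0], "rs2": [1, 1, 1, 1], "rs3": [0, 1, 0]}
    aims = select_aims(ref_a, ref_b)
    print(aims)  # ['rs1', 'rs3']
    freq_a = [allele_freq(ref_a[s]) for s in aims]
    freq_b = [allele_freq(ref_b[s]) for s in aims]
    print(estimate_ancestry([2, 1], freq_a, freq_b))
```

With only a few hundred candidate SNPs, few markers will clear a high delta threshold, which is consistent with the limited resolution noted above. Dedicated programs such as STRUCTURE or ADMIXTURE fit multi-population models jointly across individuals; the grid search here is used only to keep the sketch dependency-free and easy to check.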
Similar Resources
Cryptic relatedness in epidemiologic collections accessed for genetic association studies: experiences from the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study and the National Health and Nutrition Examination Surveys (NHANES)
Epidemiologic collections have been a major resource for genotype-phenotype studies of complex disease given their large sample size, racial/ethnic diversity, and breadth and depth of phenotypes, traits, and exposures. A major disadvantage of these collections is they often survey households and communities without collecting extensive pedigree data. Failure to account for substantial relatedne...
“Equivalent Linear Composition” as an Efficient Stratification Factor in Multipurpose Surveys
The horticulture survey is a multipurpose survey conducted ad hoc by the Statistical Center of Iran (SCI). The availability of survey variables in the sampling frame suggests a multivariate stratification in each province, based on its desired variables, to achieve higher efficiency. There are several ways to stratify the sampling frame considering all stratification variables, such as using s...
Comparison of methods for correcting population stratification in a genome-wide association study of rheumatoid arthritis: principal-component analysis versus multidimensional scaling
Population stratification (PS) represents a major challenge in genome-wide association studies. Using the Genetic Analysis Workshop 16 Problem 1 data, which include samples of rheumatoid arthritis patients and healthy controls, we compared two methods that can be used to evaluate population structure and correct PS in genome-wide association studies: the principal-component analysis method and ...
Inferences from genomic models in stratified populations.
Unaccounted population stratification can lead to spurious associations in genome-wide association studies (GWAS) and in this context several methods have been proposed to deal with this problem. An alternative line of research uses whole-genome random regression (WGRR) models that fit all markers simultaneously. Important objectives in WGRR studies are to estimate the proportion of variance ac...
Accounting for population stratification in DNA methylation studies.
DNA methylation is an important epigenetic mechanism that has been linked to complex diseases and is of great interest to researchers as a potential link between genome, environment, and disease. As the scale of DNA methylation association studies approaches that of genome-wide association studies, issues such as population stratification will need to be addressed. It is well-documented that fa...